An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
نویسنده
چکیده
We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. Probabilities (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.
منابع مشابه
Parsing with Probabilistic Grammars
This paper describes some recent developments in the area of natural language parsing. Probabilistic Grammars are, in principle, grammars enriched with probabilities to distinguish between probable and improbable analyses of a sentence. The first part of the paper introduces the notation of Probabilistic Context-Free Grammars, together with a general algorithm for PCFG top-down parsing and some...
متن کاملGeneralized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...
متن کاملProceedings of the 9 th International Workshop Finite State Methods and Natural Language Processing
The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in (Bar-Hillel et al., 1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by (Lang, 1994) that ...
متن کاملIntersection for Weighted Formalisms
The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in (Bar-Hillel et al., 1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by (Lang, 1994) that ...
متن کاملDeterministic Parsing using PCFGs
We propose the design of deterministic constituent parsers that choose parser actions according to the probabilities of parses of a given probabilistic context-free grammar. Several variants are presented. One of these deterministically constructs a parse structure while postponing commitment to labels. We investigate theoretical time complexities and report experiments.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computational Linguistics
دوره 21 شماره
صفحات -
تاریخ انتشار 1995